LIMSI's participation to the 2013 shared task on Native Language Identification
Abstract
This paper describes LIMSI’s participation in the first shared task on Native Language Identification. Our submission uses a Maximum Entropy classifier whose features are character and chunk n-grams, spelling and grammatical mistakes, and lexical preferences. A two-step classifier that better distinguishes otherwise easily confused native languages slightly improved performance.
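As a rough illustration of one feature type named in the abstract, character n-grams can be counted as sketched below. This is a minimal sketch and not the authors' implementation: the function name `char_ngrams` and the `n_min`/`n_max` parameters are hypothetical, and the abstract does not state which n-gram orders were actually used.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count all character n-grams of length n_min..n_max in text.

    The range of orders is an assumption made for illustration only.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the text, one character at a time.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: bigram and trigram counts for a short string.
features = char_ngrams("the cat", n_min=2, n_max=3)
```

Counts of this kind could then be passed as features to a classifier such as a Maximum Entropy model, alongside the other feature types the abstract lists.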
Similar resources
The Story of the Characters, the DNA and the Native Language
This paper presents our approach to the 2013 Native Language Identification shared task, which is based on machine learning methods that work at the character level. More precisely, we used several string kernels and a kernel based on Local Rank Distance (LRD). Actually, our best system was a kernel combination of string kernel and LRD. While string kernels have been used before in text analysi...
Maximizing Classification Accuracy in Native Language Identification
This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic English proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays written by nonnativ...
A Report on the First Native Language Identification Shared Task
Native Language Identification, or NLI, is the task of automatically classifying the L1 of a writer based solely on his or her essay written in another language. This problem area has seen a spike in interest in recent years as it can have an impact on educational applications tailored towards non-native speakers of a language, as well as authorship profiling. While there has been a growing bod...
NAIST at the NLI 2013 Shared Task
This paper describes the Nara Institute of Science and Technology (NAIST) native language identification (NLI) system in the NLI 2013 Shared Task. We apply feature selection using a measure based on frequency for the closed track and try Capping and Sampling data methods for the open tracks. Our system ranked ninth in the closed track, third in open track 1 and fourth in open track 2.
Using N-gram and Word Network Features for Native Language Identification
We report on the performance of two different feature sets in the Native Language Identification Shared Task (Tetreault et al., 2013). Our feature sets were inspired by existing literature on native language identification and word networks. Experiments show that word networks have competitive performance against the baseline feature set, which is a promising result. We also present a discussio...
Publication date: 2013